-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workflow Landing Requests #18807
Workflow Landing Requests #18807
Conversation
78f6520
to
d723581
Compare
This is interwound with the tool request API that I thought was closer than it was. I'm finding so many corner cases in modeling tools. If this becomes a blocker for anyone I think I could pretty easily pull out the database migration, the workflow landing stuff, the workflow API that adds Alternatively I could pull just the tool API & backend enhancements out and push them back into #17393 and we could keep all the modeling enhancements in here and treat this a bit more like the CWL branch where we have models we sync up but keep the plumbing out until it is solid. Otherwise I will just spend this week trying to work on the models and just spin some smaller stuff out here and there where it makes sense. |
64d2efa
to
27359a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some seriously cool stuff, thank you @jmchilton!
required: false | ||
desc: | | ||
Workflows launched with URI/URL inputs that are not marked as 'deferred' | ||
are "materialized" (or undeferred) by the workflow scheduler. This might be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it hard to do this in Celery ? Feels like a feature we can require celery for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all of that pipeline could be celery-ified - given that this should work as is though that feels like a second iteration. Our workflow handlers "just work" now - they could be ... more decomposed for sure but it is a bigger project than we need for this feature I think. Workflow scheduling handlers never really materialized the way I wanted - it is maybe the most pointless I ever attempted to make something pluggable. I would be up for just scrapping it all for Celery. But again... future project I think.
if len(transform) > 0: | ||
dataset_source.transform = transform | ||
|
||
sa_session.add(hda) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a good spot to check if we already have a dataset with same source, hash, transform and owner ? I think workflows with repeated inputs are going to be a thing (think cross product). This is just a comment to myself basically, not required for a first iteration.
And this is being repeated across all tests, so I think we keep trying to schedule if we don't also fail the invocation. |
Gotcha. I will redo this to fail the invocation on materialization error. |
Includes a lot of plumbing for tools but not the APIs - they are not ready to go yet.
@mvdbeek The workflow failure stuff is so cool - let me know if I used it right. I guess we might want a more specific "reason" code. |
if not self.__attempt_materialize(workflow_invocation, session): | ||
return None | ||
if self.app.config.workflow_scheduling_separate_materialization_iteration: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have any measurable effect ? I would imagine materialization of even a small dataset takes so much longer than scheduling a workflow. As I understand this, we now spend a lot of time materializing a number of inputs (unless deferred), and then based on this setting we proceed (or not) to schedule the workflow.
We can always get more specific later on, that part is fine. I would totally like to merge this now, but I worry that we might make scheduling performance very unpredictable if we undefer in the scheduling loop. I don't think it's a ton of work to move that into celery, I'll give that a crack .. worst case we only allow deferred inputs in the landing API ? |
“Premature optimization is the root of all evil” -Tony Hoare. We should let it kill a handler before we worry - we have no clue if the feature will ever be used or if it would be a problem for the use cases we have in mind. Deferred data, download caches, no one using the feature... all might negate the value in that effort. |
I see this more from the angle, can one unprivileged user break scheduling for everyone else? My experience with these niche features is that they will eventually be used, and at that point admins will not know why their schedulers are stuck. |
I've written many niche features for Galaxy no one has ever used 😅. First sign of trouble and I will offer to redo it in Celery. |
I've added a little CLI tool for generating workflow landing requests URLs. External clients won't need to use this but it is a simple example with an easy to follow format and flow that describes how to use the APIs and generate custom forms with pre-populated data for users.
This pairs well with the new
{src: "url",...}
data input dictionaries and a potential minimal workflow running UI (pictured below).The UI piece is clearly an MVP - hopefully a tool form UI expert can take it over.
How to test the changes?
(Select all options that apply)
License